@VDFaller
Contributor

@VDFaller VDFaller commented Oct 29, 2025

Summary

Added a tool for parsing the manifest to get the lineage.

What Changed

Just parses then reads the manifest.

Why

So that it's usable by non-cloud customers.

Checklist

  • I have performed a self-review of my code
  • I have made corresponding changes to the documentation (in https://github.com/dbt-labs/docs.getdbt.com) if required -- Mention it here
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

There are some differences in the outputs. I can try to line them up if needed.

Prompt Call & Response

  • "Get me the lineage of this model"
    • Bad - doesn't use mcp
  • "use the dbt mcp server to tell me the full lineage of this model"
    • Good - calls both directions, recursively
  • "use the dbt mcp server to tell me the lineage of this model"
    • Bad - ran jq on the manifest
    • 2nd run - Fine - calls both directions recursively
  • "use the dbt mcp server to get me the children of this model, including tests"
    • Good - Gets non-recursive children (used the name not the unique_id) and had the tests included.

@VDFaller VDFaller requested review from a team, b-per and jasnonaz as code owners October 29, 2025 18:06
@VDFaller VDFaller marked this pull request as draft October 29, 2025 20:39
@VDFaller VDFaller force-pushed the model-lineage-cli branch 2 times, most recently from a48b7a2 to 76ac932 Compare October 30, 2025 17:27
@b-per
Collaborator

b-per commented Oct 31, 2025

Thanks Vince!

I am wondering if we should not create new tools instead of making get_model_parents/get_model_children be able to either query the Metadata API or the local artefacts.

Here are my Pros/Cons of having new tools for get_model_parents/children dedicated to the CLI/manifest

  • Pros
    • it would be easy to activate those along the rest of the CLI tools and deactivate the dbt platform ones if needed (just activating the CLI toolset)
    • people could query both the get_model_parents from the metadata API and the new local tool in a single LLM session/context to compare the changes that they are introducing
    • it feels simpler to understand what tool does what and easier to know which ones someone might want to activate/deactivate
  • Cons
    • we already have many tools and it would add more (but realistically most people shouldn't activate all tools and tweak those to their use case)
    • we'd need to find good names and descriptions that explain to the LLM the difference between the children from metadata and the children from manifest, to keep the LLM from getting confused

So, I am in the camp of adding this functionality, but in new tools; I'd be keen to hear other people's opinions about it.


As a side note, would you be able to set up signed commits for this repo? We can bypass this check at the PR level as repo admins, but this repo expects all commits to be signed now.

@VDFaller
Contributor Author

VDFaller commented Nov 5, 2025

@b-per Crap, that's how I originally had it (shouldn't have squashed 😢 )

I didn't like it because when I was trying it out, exactly like you pointed to, it seemed to arbitrarily pick which tool it used. So I could run very similar queries and it would give two different results. We could give better names/descriptions so the tool wouldn't get confused but I don't think the user would necessarily know if they were getting the answer they expected.

  • me
    • Get me the model parents for jaffle_shop.orders.
  • MCP
    • Okay, we've got some options there. Do you want production parents or local parents, recursive or not?
  • me
    • What?

I think it would just run the tool and the user would see it asking to run get_model_lineage(...) and go "that seems right", without knowing the nuance.

If it were to be two separate tools, do you think it should be a CLI tool or a discovery tool? My entire thought process was "Discovery is NOT just platform", especially after listening to Jason's talk at Coalesce where he talked about them abstractly. This very much relates to #418 in my head.

Rebased with gpgSign on, no idea why it was set to false for this repo.

Also on this

we already have many tools and it would add more (but realistically most people shouldn't activate all tools and tweak those to their use case)

  • Do y'all have data to show that's the case? I'd expect people to just give it everything.

@DevonFulcher
Collaborator

Hey @VDFaller and @b-per, I'm sorry for the back-and-forth on this. I told Vince that I was appreciative of sticking with the existing get_model_parents/get_model_children tools and routing between the local or remote version depending on the user's config. Let's get aligned on this. I think the Pros you listed are valid, Benoit, but they may not be the features worth optimizing greatly for.

I like the router approach because the agent typically doesn't care whether the information is coming from a local or remote source; the user cares more about that. Also, with the latest config changes, it is quite easy to point the agent to local or remote. If DBT_HOST is present, use GQL; otherwise, use the local version. Turning on/off more tool options depending on local or remote usage is more flexible, but it is also more complex, and I don't think most users want to use both local and remote at the same time.
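A minimal sketch of the routing rule described above, assuming the only signal is whether `DBT_HOST` is set. The two backend functions and `pick_parents_backend` are hypothetical stand-ins for illustration, not the actual dbt-mcp implementation:

```python
import os
from typing import Callable, Mapping


# Hypothetical stand-ins for the two backends (not the real dbt-mcp functions).
def parents_from_discovery_api(model_id: str) -> list[str]:
    return [f"remote parent of {model_id}"]


def parents_from_local_manifest(model_id: str) -> list[str]:
    return [f"local parent of {model_id}"]


def pick_parents_backend(
    env: Mapping[str, str] = os.environ,
) -> Callable[[str], list[str]]:
    """Return the remote (GQL) backend when DBT_HOST is set, else the local one."""
    if env.get("DBT_HOST"):
        return parents_from_discovery_api
    return parents_from_local_manifest
```

With this shape the agent always sees one tool name; the choice of source is made once from the user's config rather than per prompt.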

Furthermore, this router approach can be applied to the Semantic Layer tools in the future. It is a source of frustration for some users that these tools don't work locally.

Add a fallback path for Discovery tools to use CLI functionality
Add ModelLineage type with main constructor `from_manifest`

The CLI path will not work until auto-disable is functioning correctly.
@VDFaller VDFaller marked this pull request as ready for review November 7, 2025 16:35
@VDFaller VDFaller changed the title from "CLI fallback for get_model_parents/get_model_children discovery tools" to "Add get_model_lineage_dev CLI tool" Nov 7, 2025
@VDFaller VDFaller requested a review from a team as a code owner December 3, 2025 20:40
@DevonFulcher DevonFulcher requested a review from Copilot December 4, 2025 23:43

Copilot AI left a comment

Pull request overview

This PR adds a new get_model_lineage_dev CLI tool that enables non-cloud customers to retrieve model lineage information by parsing the local development manifest, supporting upstream, downstream, or bidirectional lineage queries with optional recursive traversal and test filtering.

Key Changes:

  • Implemented ModelLineage data model with support for recursive parent/child traversal and cycle detection
  • Added get_model_lineage_dev tool to the dbt CLI toolset with configurable direction, recursion, and exclusion options
  • Updated existing discovery prompts to clarify distinction between production and development lineage tools
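As a rough illustration of the traversal summarized above (not the PR's actual `ModelLineage` code), the sketch below walks the `parent_map` and `child_map` keys of a manifest.json. The helper name `collect_lineage` and the toy manifest are assumptions made for the example; the default prefixes mirror the `exclude_prefixes` default in the reviewed signature:

```python
from collections import deque

# Toy stand-in for a parsed manifest.json; only the two adjacency maps are used.
manifest = {
    "parent_map": {
        "model.shop.orders": ["model.shop.stg_orders"],
        "model.shop.stg_orders": ["source.shop.raw.orders"],
        "source.shop.raw.orders": [],
        "test.shop.not_null_orders_id": ["model.shop.orders"],
    },
    "child_map": {
        "model.shop.orders": ["test.shop.not_null_orders_id"],
        "model.shop.stg_orders": ["model.shop.orders"],
        "source.shop.raw.orders": ["model.shop.stg_orders"],
        "test.shop.not_null_orders_id": [],
    },
}


def collect_lineage(edges, start, exclude_prefixes=("test.", "unit_test."), recursive=True):
    """Return unique_ids reachable from start via edges, skipping excluded prefixes."""
    seen = {start}  # guards against revisiting nodes, including cycles
    found = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor in seen or neighbor.startswith(exclude_prefixes):
                continue
            seen.add(neighbor)
            found.append(neighbor)
            if recursive:  # non-recursive mode stops at direct neighbors
                queue.append(neighbor)
    return found


parents = collect_lineage(manifest["parent_map"], "model.shop.orders")
children = collect_lineage(manifest["child_map"], "model.shop.orders", exclude_prefixes=())
```

Passing an empty `exclude_prefixes` tuple is how this sketch would include tests among the children.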

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file
  • src/dbt_mcp/dbt_cli/models/lineage_types.py: New module implementing ModelLineage, Ancestor, and Descendant models with manifest parsing logic
  • src/dbt_mcp/dbt_cli/tools.py: Added get_model_lineage_dev function and _get_manifest helper to load and parse manifest.json
  • tests/unit/dbt_cli/test_model_lineage.py: Comprehensive test coverage for lineage parsing with various scenarios
  • src/dbt_mcp/tools/tool_names.py: Registered GET_MODEL_LINEAGE_DEV tool name
  • src/dbt_mcp/tools/toolsets.py: Added new tool to DBT_CLI toolset
  • src/dbt_mcp/tools/policy.py: Defined tool policy as METADATA behavior
  • src/dbt_mcp/prompts/dbt_cli/get_model_lineage_dev.md: Documentation for the new tool with usage examples
  • src/dbt_mcp/prompts/discovery/get_model_parents.md: Clarified this tool is for production manifest
  • src/dbt_mcp/prompts/discovery/get_model_children.md: Clarified this tool is for production manifest
  • README.md: Added get_model_lineage_dev to CLI tools list
  • .changes/unreleased/Enhancement or New Feature-20251203-203944.yaml: Changelog entry for the feature

Comment on lines +193 to +199
def get_model_lineage_dev(
    model_id: str,
    direction: Literal["parents", "children", "both"] = "both",
    exclude_prefixes: tuple[str, ...] = ("test.", "unit_test."),
    *,
    recursive: bool,
) -> dict[str, Any]:
Collaborator

Non-blocking: I think this should have essentially the same function signature as the other lineage tool: https://github.com/dbt-labs/dbt-mcp/pull/461/files#diff-6d91f0721d8dcde8199de504338811a7063757ec13f32eca508bfbc8b663a54bR390-R396

Contributor Author

I'll make a separate PR once they're both in to align them. Cool?

import json

_run_dbt_command(["parse"]) # Ensure manifest is generated
cwd_path = config.project_dir if os.path.isabs(config.project_dir) else None
Collaborator

Non-blocking: when the server starts up, we should make all paths absolute if they aren't already.
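One way the startup normalization could look, as a sketch only. The helper `to_absolute` is hypothetical and not part of dbt-mcp; it assumes relative paths should be resolved against the current working directory unless another base is given:

```python
from pathlib import Path
from typing import Optional


def to_absolute(path_str: str, base: Optional[Path] = None) -> Path:
    """Return path_str as an absolute path; relative paths resolve against base (default: cwd)."""
    path = Path(path_str).expanduser()
    if not path.is_absolute():
        path = (base or Path.cwd()) / path
    # resolve() also collapses ".." segments and follows symlinks.
    return path.resolve()
```

Doing this once when the config is loaded would remove the need for per-call checks such as the `os.path.isabs(config.project_dir)` test in the snippet above.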

]
else:
    # Build nested descendant trees. Prevent cycles using path tracking.
    def _build_descendant(node_id: str, path: set[str]) -> Descendant:
Collaborator

It seems like we could consolidate the two versions of this function into one. The differences are minimal.

Contributor Author

had to do some funky casting to make mypy happy. let me know if you feel like it's better.
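For readers following the thread, here is a hedged sketch of what a consolidated builder might look like. It uses plain dicts instead of the PR's `Ancestor`/`Descendant` models, and the name `build_tree` is made up for this example; the point is only that one function can serve both directions when the adjacency map is passed in:

```python
from typing import Any


def build_tree(
    edges: dict[str, list[str]],
    node_id: str,
    path: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Build a nested tree rooted at node_id, skipping nodes already on the current path."""
    path = path | {node_id}  # path tracking: only ancestors on this branch count
    return {
        "id": node_id,
        "nodes": [
            build_tree(edges, nxt, path)
            for nxt in edges.get(node_id, [])
            if nxt not in path  # breaks cycles without pruning diamond-shaped DAGs
        ],
    }


# The same function handles both directions by swapping the adjacency map,
# e.g. build_tree(manifest["parent_map"], uid) or build_tree(manifest["child_map"], uid).
edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
tree = build_tree(edges, "a")
```

Tracking the current path rather than a global visited set means a node reachable by two routes (here `c`) appears under each route, while the back edge from `c` to `a` is dropped.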

refactor ModelLineage.from_manifest for readability
3 participants